36 research outputs found

    Scalability aspects of data cleaning

    Data cleaning has become one of the important pre-processing steps for many data science, data analytics, and machine learning applications. According to a survey by Gartner, more than 25% of the critical data in the world's top companies is flawed, which can result in economic losses amounting to trillions of dollars a year. Over the past few decades, several algorithms and tools have been developed to clean data. However, many of these solutions struggle to scale as the amount of data has increased over time. For example, these solutions often involve a quadratic number of tuple-pair comparisons or the generation of all possible column combinations. Both tasks can take days to finish if the dataset has millions of tuples or a few hundred columns, which is usually the case for real-world applications. Data cleaning tasks often involve a trade-off between scalability and the quality of the solution. One can achieve scalability by performing fewer computations, but at the cost of a lower-quality solution. Therefore, existing approaches exploit this trade-off when they need to scale to larger datasets, settling for a lower-quality solution. Some approaches have considered re-thinking solutions from scratch to achieve both scalability and high quality. However, re-designing these solutions from scratch is a daunting task, as it would involve systematically analyzing the space of possible optimizations and then tuning the physical implementations for a specific computing framework, data size, and set of resources. Another component of these solutions that becomes critical with increasing data size is how the data is stored and fetched. For smaller datasets, most of the data fits in memory, so accessing it from a data store is not a bottleneck. However, for large datasets, these solutions need to constantly fetch data from and write data to a data store.
As observed in this dissertation, data cleaning tasks have a lifecycle-driven data access pattern, which is not well suited to traditional data stores, making these data stores a bottleneck when cleaning large datasets. In this dissertation, we consider scalability as a first-class citizen for data cleaning tasks and propose that scalable and high-quality solutions can be achieved by adopting the following three principles: 1) a new primitive-based re-writing of the existing algorithms that allows for efficient implementations on multiple computing frameworks, 2) efficiently involving domain experts' knowledge to reduce computation and improve quality, and 3) using an adaptive data store that can transform the data layout based on the access pattern. We make contributions towards each of these principles. First, we present a set of primitive operations for discovering constraints from the data. These primitives facilitate re-writing the existing discovery algorithms as efficient distributed implementations. Next, we present a framework that involves domain experts for faster clustering selection in data de-duplication. This framework asks a bounded number of queries to a domain expert and uses their responses to select the best clustering with high accuracy. Finally, we present an adaptive data store that can change the layout of the data based on the workload's access pattern, hence speeding up the data cleaning tasks.
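The scaling argument in the abstract above (a quadratic number of tuple-pair comparisons, and all possible column combinations) can be made concrete with a little counting. The following is a minimal sketch for illustration only; it is not the dissertation's method, and the function names are hypothetical.

```python
from math import comb


def tuple_pair_count(n_tuples: int) -> int:
    """Number of unordered tuple pairs a naive pairwise comparison visits."""
    return comb(n_tuples, 2)  # n * (n - 1) / 2, i.e. quadratic in n


def column_combination_count(n_columns: int) -> int:
    """Number of non-empty column subsets a naive enumeration generates."""
    return 2 ** n_columns - 1  # exponential in the number of columns


if __name__ == "__main__":
    # Illustrative sizes only; they are not taken from the dissertation.
    for n in (1_000, 1_000_000):
        print(f"{n:>9} tuples -> {tuple_pair_count(n):,} pair comparisons")
    for m in (10, 100):
        print(f"{m:>9} columns -> {column_combination_count(m):,} column subsets")
```

Under these assumptions, a million tuples already imply roughly half a trillion pair comparisons, which is consistent with the abstract's remark that such tasks can take days.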

    EdgeX: Edge Replication for Web Applications

    Global web applications face the problem of high network latency due to their need to communicate with distant data centers. Many applications use edge networks for caching images, CSS, JavaScript, and other static content in order to avoid some of this network latency. However, for updates and for anything other than static content, communication with the data center is still required, and can dominate application request latencies. One way to address this problem is to push more of the web application, as well as the database on which it depends, from the remote data center towards the edge of the network. This thesis presents preliminary work in this direction. Specifically, it presents an edge-aware dynamic data replication architecture for relational database systems supporting web applications. The objective is to allow dynamic content to be served from the edge of the network, with low latency.

    The acute effect of resistance exercise on serum growth hormone and blood glucose in healthy non-obese adolescent subject

    Background: The growth hormone (GH) response to resistance training is altered by many factors, including sex steroid concentrations, fitness, intensity of exercise, age, gender, duration of exercise and glycemic state, but the exact interplay between different exercises, GH levels and the resulting physiological adaptations is still obscure. This study aimed to see how resistance exercise affects GH levels and how these correlate with plasma glucose levels in healthy non-obese adolescent subjects. Methods: 48 healthy non-obese adolescent subjects, 24 males and 24 females, were included in the study. A high-volume exercise training regimen was used, which involved the major muscle groups of the arms, legs and trunk. Pre- and post-exercise levels of serum GH and random blood sugar were estimated in the male and female groups. Results: The mean body mass index (BMI) of the male and female groups was 23.22±3.12 kg/m2 and 20.40±4.49 kg/m2, respectively. The post-exercise serum GH levels in males and females increased significantly by 0.54±1.041 ng/ml (p<0.05) and 0.85±1.023 ng/ml (p<0.001), respectively. The random blood sugar levels after exercise increased significantly (p<0.05) in males by 7.16±12.61 mg/dl and in females by 6.20±12.09 mg/dl (p<0.05). There was a significant correlation (p<0.05) between the increase in serum GH levels and the increase in random blood sugar levels in both the male and female groups. Conclusions: The exercise-induced increase in GH and its interplay with serum glucose can be better understood through meta-analyses or more elaborate studies of the major hormones and fuels involved.

    Utility of the Epworth sleepiness scale: Hindi version in identifying obstructive sleep apnoea in adult patients with symptoms of sleep disordered breathing in a tertiary care centre

    Background: Excessive daytime sleepiness is a key symptom in patients with sleep-breathing disorders (SBD) and represents a new major public health issue due to its repercussions. The Epworth Sleepiness Scale (ESS) is a simple and validated method that measures the probability of falling asleep in a variety of situations. The aim of this study was to assess the accuracy of the ESS questionnaire in identifying obstructive sleep apnoea (OSA) in patients with symptoms of sleep disordered breathing in a tertiary care centre. Methods: This study was conducted in the Department of Respiratory Medicine, New Medical College, Kota on 70 adult patients who presented with symptoms of sleep disordered breathing and underwent Type 2 polysomnography after completing the ESS in Hindi. Results: With an ESS score of more than 10 taken as the cut-off, the ESS predicted excessive daytime sleepiness in 60% of study subjects. The mean ESS value in the study was 10.78. Polysomnography diagnosed severe OSA in 35.71% of the patients, moderate OSA in 30% and mild OSA in 7.14%. The sensitivity of an ESS score >10 in diagnosing OSA was 72.5%, and the specificity was 73.6%. There was a significant correlation between ESS score and diagnosis of OSA (p value <0.001). Conclusions: The study concludes that the ESS is useful for predicting OSA in patients with sleep disordered breathing.

    A study of radiological presentation in bronchogenic carcinoma along with prevelance of pulmonary TB in a tertiary center

    Background: Lung cancer is the most common cause of cancer-related death in men and women worldwide, responsible for over 1 million deaths annually. It is the leading cause of cancer death in the United States and worldwide. Lung cancer is the most common neoplasm; it is more frequent among males and causes cancer-related mortality in both sexes. The objective of this study was to examine the radiological presentation of bronchogenic carcinoma along with the prevalence of pulmonary TB in a tertiary center. Methods: A total of 100 patients with histologically proven lung cancer were studied from July 2018 to June 2019 at a tertiary center in Kota, Rajasthan. Data were collected on demographics, smoking history, clinical presentation, histopathological type, and radiographic findings on chest radiograph, ultrasonography and computed tomography (CT) scan. Statistical analysis was performed using descriptive statistics of the collected data. Results: Bronchogenic carcinoma was most common in the 60-69 year age group (37%), with male predominance (82%). A smoking history was present in about 80% of patients. The most common radiological presentation was a mass lesion, present in 91% of patients (n=91), followed by unilateral hilar prominence, present in 44% of patients (n=44). Other common findings included mediastinal widening (38%), collapse (26%), pleural effusion (22%), metastasis (22%), cavitation (13%), consolidation (12%), bony erosion (11%), pneumothorax (5%), and Pancoast tumor (4%). The prevalence of pulmonary TB in bronchogenic carcinoma was 9%, which is attributed to the high burden of pulmonary TB in India. Conclusions: In this study adenocarcinoma was found to be the most common type of lung cancer. Smoking is the most common risk factor. Coexistence of pulmonary TB with bronchogenic carcinoma was more common. Local immunity is deteriorated in cancer cases.

    Role of medical thoracoscopy in the management of parapneumonic effusion and empyema thoracic

    Background: The treatment modality used in early pleural empyema depends mainly on antimicrobial therapy along with thoracocentesis. In complicated empyema this modality does not work, and the lung does not fully expand until the adhesions are removed. The main purpose of the current study is to analyze the experience of managing complicated parapneumonic effusion and empyema thoracis through rigid medical thoracoscopy under local anaesthesia. The aim is to study the role of medical thoracoscopy in the management of empyema thoracis and parapneumonic effusion at a tertiary health centre. Methods: This is a descriptive case series study in which 49 patients with clinical and radiological evidence of empyema thoracis were recruited from the Department of Respiratory Medicine, GMC, Kota, Rajasthan. All patients underwent medical thoracoscopy under local anaesthesia. Written informed consent was taken from the study participants. Ethical approval was obtained from the Ethical Review Committee of the hospital. Patients who were HIV or HBsAg positive, and those with multiple organ failure or bleeding disorders, were excluded. Results: Of the 49 patients, 41 (84%) were male and 8 (16%) were female, with a mean age of 45 years (range 18 to 70 years). Final evaluation by chest X-ray revealed complete resolution, that is, successful thoracoscopy, in 37 cases of fibrinopurulent empyema (92.50%) and 5 cases of organizing empyema (55.56%). The overall success rate was 85.71%. A total of 7 cases (3 cases of fibrinopurulent and 4 cases of organizing empyema) were referred to a higher center for decortication. Conclusions: Medical thoracoscopy under local anaesthesia is a safe, efficient and cost-effective intervention for the management of complicated empyema, particularly in the early (fibrinopurulent) stage.
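The success rates reported in the results above can be checked with simple arithmetic. The sketch below assumes that each subgroup total equals the successful cases plus the referred cases (37 + 3 fibrinopurulent, 5 + 4 organizing); those totals are inferred from the abstract, not stated in it, and the function name is hypothetical.

```python
def success_rate(successes: int, total: int) -> float:
    """Return the success rate as a percentage rounded to two decimals."""
    return round(100 * successes / total, 2)


# Subgroup totals are inferred: successful cases plus cases referred for decortication.
fibrinopurulent_total = 37 + 3
organizing_total = 5 + 4
all_patients = fibrinopurulent_total + organizing_total

if __name__ == "__main__":
    print("patients:", all_patients)
    print("fibrinopurulent:", success_rate(37, fibrinopurulent_total))
    print("organizing:", success_rate(5, organizing_total))
    print("overall:", success_rate(37 + 5, all_patients))
```

Under this assumption the inferred totals add up to the 49 recruited patients, and the three percentages agree with the figures quoted in the abstract.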

    Effect of Loading Rate on Creep Properties of HgCdTe Epitaxial Films

    Nanoindentation creep studies were performed on Hg1-xCdxTe (x~0.29) epitaxial films using loading rates of 0.5 mN.s-1, 1 mN.s-1, 2 mN.s-1 and 4 mN.s-1, with a constant peak load of 10 mN. A constant hold time of 20 s at peak load was maintained for all experiments. The effect of loading rate on the creep behaviour of the material has been investigated. Creep displacement showed an increasing trend with increasing loading rate. Stress exponents were extracted by fitting the creep curves with an empirical equation. A strong dependence of the stress exponent on loading rate was observed. The stress exponent was found to vary in the ranges 0.60-1.76, 0.96-2.23, 0.98-2.87 and 0.90-2.81 for loading rates of 0.5 mN.s-1, 1 mN.s-1, 2 mN.s-1 and 4 mN.s-1, respectively. The change in stress exponent was attributed to a change in creep mechanism. Hardness and elastic modulus were extracted from load-displacement curves, and it was found that hardness increases with increasing loading rate while elastic modulus remains constant. A correlation between the variation of hardness and creep displacement has also been presented.

    Verification of Three Phase Full Wave Controlled Rectifier using MATLAB Simulation Model

    In this paper, a three-phase full-wave controlled rectifier is modeled and implemented in MATLAB Simulink, version 7.10.0.449 [R2010a]. The paper also deals with the simulation analysis of the three-phase full-wave controlled rectifier by obtaining various waveforms. For large-power dc loads, three-phase ac-to-dc converters are commonly used. The three-phase half-wave converter is rarely used because it introduces a dc component in the supply current. Some years back, ac-to-dc power conversion was achieved using motor-generator sets, mercury arc rectifiers, and thyratron tubes. Modern ac-to-dc power converters are designed using high-power, high-current thyristors, and at present most ac-dc power converters are thyristorised. This paper also shows how a full-wave controller works for different firing angles, and waveforms were obtained for verification.
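The dependence on firing angle mentioned in the abstract above can be illustrated with the standard textbook expression for the average output voltage of an ideal three-phase full-wave (six-pulse) thyristor bridge, V_dc = (3·√2·V_LL / π)·cos α. The following is a minimal sketch in Python rather than the paper's Simulink model; it assumes continuous load current, ideal thyristors and no source inductance, and the function name and the 400 V example value are hypothetical.

```python
import math


def average_output_voltage(v_line_rms: float, firing_angle_deg: float) -> float:
    """Average dc output of an ideal six-pulse thyristor bridge.

    Assumes continuous conduction, ideal devices and no source inductance.
    v_line_rms is the rms line-to-line supply voltage in volts.
    """
    alpha = math.radians(firing_angle_deg)
    return 3 * math.sqrt(2) * v_line_rms / math.pi * math.cos(alpha)


if __name__ == "__main__":
    # 400 V line-to-line is an illustrative value, not taken from the paper.
    for angle in (0, 30, 60, 90):
        print(f"alpha = {angle:>2} deg -> {average_output_voltage(400.0, angle):7.2f} V")
```

Under these assumptions the average voltage falls with the cosine of the firing angle and reaches zero at 90 degrees; a simulation with realistic loads may deviate from this idealised curve.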